MEDIA STREAM MIXING

The present invention relates to real time mixing of at least two media streams in a portable
communication device. More particularly, it relates to a method and a device for real time
mixing of at least two media streams, for providing a real time transmitted output media
stream.

DESCRIPTION OF RELATED ART 

Most third generation mobile terminals will have a Video Telephony (VT) application 
implemented, which is based on the 3GPP specification 324M. This VT-application enables a 
person-to-person connection with communication including real-time voice and video 
information. These applications comprise recording and generating of one single video 
stream containing both audio and image information. 

If it were possible to combine two different media streams in a portable communication 
device, a number of attractive functions could be provided such as exchanging voice from 
one stream with voice from another, replacing image information of one stream with image
information of the other etc. 

One interesting way of using VT would be to use so called "show and tell". This means that 
when playing a recorded video including audio and image information, voice (audio) is 
simultaneously added to this stream. 

Furthermore, some consumers might be concerned about apparently being filmed at 
locations where it may be inappropriate or where the other party is not allowed to see the
location because of security reasons. The apparent real time movie may for instance be a 
combination of two different movies, one of which shows the location and the other shows 
the consumer. 

This invention concerns how to increase the usage of a mobile VT application and how to be
able to handle integrity issues when used in a video phone.

There is thus a need for a method and a device that can provide an output media stream 
that is based on two separate input streams, where at least one is a real-time stream. 

SUMMARY OF INVENTION



This invention is thus directed towards solving the problem of providing a real time 
synchronized output media stream being transmitted from a portable communication device 
where said output media stream is a mixture of a first (real time) media stream and a 
second media stream. 

This is achieved by generating in real time a first media stream in the portable
communication device, and combining in real time the first media stream with a second 
media stream, for forming the output media stream. 

One object of the present invention is to provide a method for forming a real time output 
media stream made up of a first real-time media stream and a second media stream. 

According to a first aspect of this invention, this object is achieved by a method for forming 
an output media stream to be transmitted during a communication session from a portable 
communication device, wherein said output media stream comprises signals of a first type,
the method comprising the steps of:

generating in real time a first media stream in the portable communication
device, and 

combining in real time the first media stream with a second media stream, 
for forming the output media stream. 

A second aspect of the present invention is directed towards a method including the
features of the first aspect, wherein said output media stream comprises signals of a second
type.

A third aspect of the present invention is directed towards a method including the features
of the first aspect, further comprising the step of transmitting said output media stream.

A fourth aspect of the present invention is directed towards a method including the features
of the first aspect, further comprising the step of establishing a connection with another
device.

A fifth aspect of the present invention is directed towards a method including the features of
the fourth aspect, wherein said connection is a circuit-switched connection. 

A sixth aspect of the present invention is directed towards a method including the features 
of the first aspect, in which at least one of the steps is dependent on input data from a user 
of said portable communication device. 

A seventh aspect of the present invention is directed towards a method including the
features of the first aspect, wherein the step of combining comprises combining signals of a
first type from the first media stream with signals of a second type from the second media
stream.

WO 2005/004450    PCT/EP2004/006226



An eighth aspect of the present invention is directed towards a method including the 
features of the first aspect, wherein the step of combining comprises combining signals of a 
first type from the first media stream with signals of the first type from the second media 
stream.

A ninth aspect of the present invention is directed towards a method including the features
of the [...] aspect, wherein the step of combining further comprises combining signals of a
second type from the first media stream with the signals from the second media stream.

A tenth aspect of the present invention is directed towards a method including the features
of the eighth aspect, wherein the step of combining further comprises combining signals
from the first media stream with signals of the second type from the second media stream.

An eleventh aspect of the present invention is directed towards a method including the
features of the tenth aspect, wherein the step of combining further comprises combining
signals of the second type from the first media stream with signals from the second media
stream.

A twelfth aspect of the present invention is directed towards a method including the features 
of the eleventh aspect, wherein the step of combining further comprises the step of: 

delaying, prior to combining, signals of one type of the second media stream 
in relation to the other type of signals of the same stream, for providing
synchronized signals from the second media stream within the output media 
stream. 

A thirteenth aspect of the present invention is directed towards a method including the
features of the [...] aspect, wherein the step of combining further comprises independently
combining signals of the first type and signals of the second type.

A fourteenth aspect of the present invention is directed towards a method including the
features of the ninth aspect or the eleventh aspect, wherein the step of combining further 
comprises delaying signals of one type within the output media stream, in relation to the 
other type of signals of the same stream, for providing synchronized signals from the first 
media stream within the output media stream. 

A fifteenth aspect of the present invention is directed towards a method including the 
features of the ninth aspect, wherein the step of combining signals, where the signals of the 
first type are audio signals, further comprises the step of superposing the signals of said
first type.

A sixteenth aspect of the present invention is directed towards a method including the
features of the fifteenth aspect, wherein the step of superposing comprises weighting 
properties of the audio signals from the first media stream and the second media stream. 

A seventeenth aspect of the present invention is directed towards a method including the 
features of the ninth aspect, wherein the step of combining signals, where the signals of the 
first type are image signals, further comprises the step of blending the signals of the first 
type. 

An eighteenth aspect of the present invention is directed towards a method including the
features of the seventeenth aspect, wherein the step of blending comprises weighting 
properties of the image signals from the first media stream and the second media stream. 

A nineteenth aspect of the present invention is directed towards a method including the
features of the sixteenth aspect, wherein weighting properties includes varying the
proportion of signals from the first media stream in relation to the proportion of signals from
the second media stream.

A twentieth aspect of the present invention is directed towards a method including the
features of the nineteenth aspect, wherein the weighting properties are dependent on input
data of a user of said portable communication device.

A twenty-first aspect of the present invention is directed towards a method including the 
features of the nineteenth aspect, wherein varying said proportions comprises varying
each proportion within the range between 0 and 100%.

Another object of the present invention is to provide a portable communication device for 
forming a real time output media stream made up of a first real time media stream and a 
second media stream. 

According to a twenty-second aspect of the present invention, this object is achieved by a
portable communication device for forming an output media stream to be transmitted during 
a communication session from said portable communication device, wherein said output 
media stream comprises signals of a first type, said portable communication device 
comprising:

a generating unit provided for generating a first media stream in real time,
a first combining unit, connected to said generating unit, provided for
combining in real time the first media stream with a second media stream,
and

a control unit controlling the generating unit and the combining unit.

A twenty-third aspect of the present invention is directed towards a portable communication 
device including the features of the twenty-second aspect, for forming an output media 
stream to be transmitted during a communication session from said portable communication
device, wherein the first combining unit is provided for combining signals of the first type of
both the first and second media streams, wherein the output media stream comprises
signals of the first type and a second type, further comprising:

a second combining unit,
for combining signals of the second type of the first media stream and signals of the second
type of the second media stream, for providing the output media stream comprising signals
of both the first and second type.

A twenty-fourth aspect of the present invention is directed towards a portable 
communication device including the features of the twenty-second aspect, further 
comprising: 

a memory unit for providing storage for the second media stream. 

A twenty-fifth aspect of the present invention is directed towards a portable communication 
device including the features of the twenty-second aspect, further comprising:

a user input interface for providing user input and connected to the control 
unit so that the generating unit and all combining units are controlled in 
dependence of user input. 

A twenty-sixth aspect of the present invention is directed towards a portable communication
device including the features of the twenty-third aspect, further comprising:

a multiplexing unit (220) for providing synchronization of signals of one type
from the first media stream in relation to signals of the other type from the same
first media stream, within the output media stream.

A twenty-seventh aspect of the present invention is directed towards a portable 
communication device including the features of the twenty-third aspect, further comprising:

a delaying unit for providing synchronized signals within the output media 
stream. 

A twenty-eighth aspect of the present invention is directed towards a portable 
communication device including the features of the twenty-seventh aspect, where the 
delaying unit provides synchronization of signals from the second media stream, prior to 
combining with the first stream.

A twenty-ninth aspect of the present invention is directed towards a portable communication 
device including the features of the twenty-eighth aspect, where the delaying unit provides
synchronization of signals of one type in relation to signals of the other type from the same
second media stream.



The present invention provides an output media stream where a first real-time media 
stream is combined with a second media stream. This has the advantage that the two media 
streams can be combined in a number of ways to provide a number of different attractive
functions, for example the following.

A user of a mobile device can, instead of separately sending video camera images to a
communicating party, transmit a pre-recorded video or sound, while mixing said
pre-recorded video with real time voice or audio information in order to give the
communicating party the perception that the user is located at another place than he
actually is. Such an effect can further be enhanced by mixing moving image information,
such as the face of the user, into said pre-recorded video.

Upon receiving a video phone call the user can, instead of sending real time video camera 
images from his camera to the calling party, decide to play a pre-recorded video answering
message, containing moving or still pictures, stored in memory, in order to provide a mobile 
video answering machine. 

During a conversation a user of the communication device can share content information 
such as video or still images instantly by providing it in the output media stream of a VT- 
session. This can thus be used for allowing for simultaneous multimedia. 

Another example is the sending of a pre-recorded video file during start-up of a VT
session, where said file can contain advertisements, qualifying for reduced communication
tariffs.

It should be emphasized that the term "comprises/comprising" when used in this 
specification is taken to specify the presence of stated features, integers, steps or 
components, but does not preclude the presence or addition of one or more other features,
integers, steps, components or groups thereof.



BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will now be described in more detail in relation to the enclosed 
drawings, in which: 

fig. 1 shows a method for providing in real time a synchronized output media stream that is 
transmitted from a portable communication device, where said output media stream is a
mixture of a first real time media stream and a second media stream; and

DETAILED DESCRIPTION OF THE EMBODIMENTS

The present invention relates to the forming of a synchronized output media stream to be
transmitted during a communication session from a portable communication device.

The features according to this preferred embodiment will be available in the 3G-324M
application of communication devices. In order to achieve this, the flexibility of the H.[...]
protocol will be utilized together with an updated SW/HW architecture.

Reference will now be given to figs. 1 and 2, illustrating a method for providing a
synchronized output media stream in real time that is transmitted from a portable
communication device, and a portable communication device [...].

[...] a connection, [...] device, with which the user of the portable communication device
[...] establish a VT communication session. This is done through [...]

[...] performed by an audio generating unit 206 and an image generating unit 208, both
[...] the control unit 204. This is done through recording images via a camera
comprised in the image generating unit 208 and audio via a microphone comprised within
the audio generating unit 206.

According to this embodiment the first media stream [...] information. Now, providing of the
second media stream, step 106, [...] obtaining said media stream from a memory unit 210.
Also this [...] of the control unit, depends on user input. [...] comprised [...] device [...]
demultiplexing unit 212. The demultiplexing unit 212 performs [...] in order to obtain a
format that is suitable for [...] information of the same [...] combining [...] streams now
contain decoded separated [...] information respectively.

The above mentioned demultiplexing and decoding of the different signal types, according to 
the present invention, consumes different amounts of time. In this preferred embodiment 
the image processing path requires more time than the audio processing path, which is the 
general case. For obtaining a subsequent output media stream comprising synchronized 
audio and image information from the second stream, audio information from this second 
stream is subjected to delaying, step 108, by a delaying unit 214. The amount of delay used 
is further determined by the control unit, based on the difference in time of the different 
processing steps. However, the amount of delay used is also dependent on any time 
difference between audio and image information of the first media stream. The delaying unit 
214, effective on the second media stream, hence also has to compensate for a portion of 
subsequent synchronizing by a multiplexing unit 220, effective on the output media stream,
which portion is due to any timing difference between audio and image information within
the first media stream.
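
The delay selection described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the millisecond units and the sign convention of the first-stream offset are hypothetical and are not taken from the specification.

```python
def second_stream_delay_ms(image_path_ms, audio_path_ms, first_stream_offset_ms=0.0):
    """Return (signal_type, delay_ms) for the second media stream.

    image_path_ms / audio_path_ms: time spent demultiplexing and decoding
    each signal type of the second stream.
    first_stream_offset_ms: hypothetical extra compensation for a timing
    difference between audio and image in the first stream (a positive
    value means the audio must be held back further).
    """
    diff = image_path_ms - audio_path_ms + first_stream_offset_ms
    if diff >= 0:
        # Image path is slower (the general case): hold back the audio.
        return ("audio", diff)
    # Audio path is slower: hold back the image information instead.
    return ("image", -diff)
```

With an image path of 80 ms and an audio path of 20 ms, the sketch holds the audio back by 60 ms; if the audio path were the slower one, the image information would be delayed instead, as in the variation described later in the text.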

The separated types of information, i.e. audio and image, are now subjected to combining.
Audio information from both streams is combined, step 110, by a first combining unit 216.
A second combining unit 218 similarly combines image information from both streams,
step 112. The combining of audio information is performed by superposing audio
information of the first stream on audio information of the second stream. This combining
further includes weighting the properties of the audio information from the first stream and
the audio information from the second stream. This encompasses varying the proportion of
audio information from one stream in relation to the proportion of audio information from
the other stream.
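
A minimal sketch of weighted superposition of audio is given below. It assumes decoded 16-bit PCM samples at a common sample rate and two independent weights between 0 and 1; the function name and the sample format are illustrative assumptions, not details given in the specification.

```python
def superpose_audio(first, second, first_weight=1.0, second_weight=1.0):
    """Superpose two sequences of 16-bit PCM samples with per-stream weights.

    Each weight is the proportion (0.0 to 1.0) of the corresponding stream
    in the mix. If the inputs differ in length, the result is truncated to
    the shorter one.
    """
    for w in (first_weight, second_weight):
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights must be between 0 and 1")
    mixed = []
    for a, b in zip(first, second):
        sample = round(first_weight * a + second_weight * b)
        # Clamp to the signed 16-bit range to avoid wrap-around distortion.
        mixed.append(max(-32768, min(32767, sample)))
    return mixed
```

Setting one weight to 0 passes only the other stream through, which corresponds to exchanging the voice of one stream for that of the other.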

According to this preferred embodiment of the present invention the first combining unit 
216 includes coding of the combined audio information to a suitable format such as AMR. 

For image information the combining unit 218 combines image information from the first
stream with image information from the second stream by a process called α-blending,
which is well known to a person skilled in the art and will therefore not be further discussed
here. This combining of image information however includes weighting properties of the
image information from the first stream and the second stream. Similar to the combining of
audio information by the first combining unit 216, the weighting properties within the second
combining unit 218 include varying the proportions of image information from one stream
in relation to the proportion of image information from the other stream.
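
For readers unfamiliar with the technique, a minimal sketch of α-blending two decoded frames follows. It assumes frames given as equally sized lists of (R, G, B) tuples with 0 to 255 per channel, and a single weight `alpha` for the first stream with the remainder going to the second; these names and the frame representation are illustrative assumptions.

```python
def alpha_blend(first_frame, second_frame, alpha):
    """Blend two equally sized frames of (R, G, B) pixels.

    alpha is the proportion (0.0 to 1.0) taken from the first stream;
    the second stream contributes the remaining 1 - alpha.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if len(first_frame) != len(second_frame):
        raise ValueError("frames must have the same number of pixels")
    return [
        tuple(round(alpha * c1 + (1.0 - alpha) * c2) for c1, c2 in zip(p1, p2))
        for p1, p2 in zip(first_frame, second_frame)
    ]
```

An `alpha` of 1.0 shows only the first stream and 0.0 only the second, covering the full 0 to 100% range of proportions mentioned in the summary.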

Weighting properties of audio and image information are dependent on user input data
obtained from the user via the user input interface 202.

Moreover, according to said preferred embodiment the second combining unit 218 
comprises coding the combined image information to a suitable format, such as MPEG-4. 

The steps of combining information of the same type from the two different streams are now
followed by forming the output media stream, step 114, by the multiplexing unit 220.

This multiplexing unit 220 further contains synchronizing capabilities in order to achieve
internal synchronizing between the two types of information from the first media stream,
i.e. to synchronize the audio with the image information from this stream. This
synchronizing takes into consideration any time difference between the audio information
and the image information within the first stream. However, it also respects any time
difference between the time required for audio information to pass the combining unit, on
the one hand, and the time required for image information to pass the combining unit, on
the other hand. These required durations will typically depend on the presence of audio
and/or image information in the media streams being combined.
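
One way to picture the synchronizing multiplexer is as a timestamp-ordered merge of the combined audio and image packets, with a per-type offset that absorbs the timing differences mentioned above. The sketch below assumes packets represented as (timestamp_ms, payload) pairs and hypothetical offset parameters; the actual 3G-324M multiplexing format is not modelled.

```python
import heapq


def multiplex(audio_packets, image_packets, audio_offset_ms=0, image_offset_ms=0):
    """Interleave audio and image packets into one time-ordered output stream.

    Each input is a list of (timestamp_ms, payload) pairs already sorted by
    timestamp. The offsets shift one signal type relative to the other to
    compensate for differing capture or combining times.
    """
    audio = ((t + audio_offset_ms, "audio", p) for t, p in audio_packets)
    image = ((t + image_offset_ms, "image", p) for t, p in image_packets)
    # heapq.merge keeps the result sorted by the adjusted timestamp.
    return list(heapq.merge(audio, image, key=lambda pkt: pkt[0]))
```

Shifting the audio by 10 ms, for example, moves each audio packet later in the interleaved output while the image packets keep their original positions.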

Upon having formed the output media stream including information from the first media 
stream and synchronized information of the second stream, this combined multiplexed 
output stream is subjected to real time transmitting, step 116, by the transmitting unit 222. 

With reference to the portable communication device as shown in fig. 2, it is seen that the
control unit 204 is connected to all the other performing units, in order to control them
[...] input data from the user input interface 202. The step of generating
the first media stream, step 104, providing the second media stream, step 106, and the
steps of combining audio information, step 110, and combining image information, step
112, require user input data.

Furthermore, in order to delay the correct type of information, either audio or image
information, feedback signaling is included between the second combining unit 212 and the
control unit 204 to adjust the delay subject to the correct type of information in the delaying
unit.



The invention can be varied in many ways, for instance:

The first media stream can comprise only image information, only audio information or a
combination of both. Also the second media stream can comprise only image information,
only audio information or a combination of both. All these different variations of the first and
second media streams can be combined. The memory unit can be either fixed in the device
or be an easily replaceable unit, such as a memory stick or another memory unit that is 
connectable to or insertable in the portable communication device. 

Image information can furthermore be provided within the first or second stream, as moving 
pictures, or a combination of both still pictures and moving pictures. 

Processing of audio information from the second media stream can be more time consuming 
than processing of image information from the same stream, which means that image
information of the second stream needs to be delayed in relation to the audio information in
order to obtain an output media stream containing synchronized information.

The second media stream may furthermore contain audio and image information coded by 
using any of a large number of different codes. The first and second combining units may
furthermore have coding capabilities to encode the superposed audio information and the
blended image information in a large number of different formats.

According to another embodiment, the first media stream is provided as a single multiplexed
first media stream from a single media stream generating unit. In this case, an additional
unit, a demultiplexing unit, is needed to demultiplex this multiplexed media stream, prior to
the steps of combining audio information and image information separately.

Another possible variation is to execute the steps according to the method in a different
order.

It is furthermore possible to form an output media stream from more than two media 
streams as well as to form an output media stream having information of more than two 
different types. It is furthermore possible to form an output media stream by combining
information from multimedia streams. 

According to yet another embodiment of the present invention a first and a second 
real time media stream are combined. In this case the second media stream is provided to 
the portable communication device in real time. One example of this embodiment is 
combining one real time media stream from a camera mounted for instance on the front of 
a portable device with another real time media stream from a camera mounted for instance 
on the back side of the same portable device. Holding the portable device in one's hand with 
a stretched out arm standing for instance in front of a sight-seeing spot, with the two 
cameras directed in different or opposite directions, enables one to combine the media
stream containing audio and image information of oneself with the second stream containing
audio and image of one's current location, i.e. the sightseeing spot. It is thus easy and
[...] oneself a real time stream containing audio and image information,
without the need of finding a second person for assistance.

A method and a device for forming a real time output media stream by mixing a first media
stream with a second media stream have thus been described.

The provision of mixing of media streams provides a number of attractive functions, for
instance:

A user of a mobile device can, instead of separately sending video camera images to a
communicating party, transmit a pre-recorded video or sound while mixing said
pre-recorded video with real time voice or audio information. As this mixing is performed in
real time [...], the communicating party might get the impression that the user is in another
location than he actually is, for instance at a luxurious vacation resort.

This effect can furthermore be enhanced by mixing moving image information, such as the
face of the user, into said pre-recorded video.

This is also applicable in other situations, when for instance the communicating party is not
allowed to see the location for security reasons. 

Upon receiving a video phone call the user can, instead of sending real time video camera
images from his camera to the calling party, decide to play a pre-recorded video answering
message, containing moving or still pictures, stored in memory. This feature can thus be used
to provide a mobile video answering machine. This can be useful since the user may not want to
turn on his live video camera when answering a VT call, but still have the possibility of
receiving pictures from a calling party.

During a conversation a user of the communication device can share content information
such as video or still images instantly as a bearer for exchanging media files, allowing for
simultaneous multimedia.

Another example is the sending of a pre-recorded video file during start-up of a VT session,
where said file can contain advertisements, qualifying for reduced communication tariffs.



